Mats: MultiCore Adaptive Trace Selection
Dynamically optimizing programs is worthwhile only if the overhead created by the dynamic optimizer is less than the benefit gained from the optimization. Program trace selection is one of the most important, yet time-consuming, components of many dynamic optimizers. Applying monitoring and profiling dynamically can often result in an execution slowdown rather than a speedup, and achieving significant performance gains from dynamic optimization has proven quite challenging. However, current technological advances, namely multicore architectures, enable us to design new approaches to meet this challenge. Selecting traces in current dynamic optimizers is typically achieved through instrumentation that collects control-flow information from a running application. Using instrumentation for runtime analysis requires the trace selection algorithms to be lightweight, which limits how sophisticated they can be. This is problematic because the quality of the traces determines the potential benefit of optimizing them. In many cases, even with a lightweight approach, the overhead incurred exceeds the benefit of the optimizations. In this paper we exploit the multicore architecture to design an aggressive trace selection approach that produces better traces and does not perturb the running application.
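The core idea of offloading trace selection can be illustrated with a toy hot-trace selector that consumes a recorded control-flow stream, the kind of analysis that could run on a spare core without perturbing the application. This is a minimal sketch of frequency-based trace growing, not the paper's actual algorithm; all names and thresholds are illustrative.

```python
from collections import Counter, defaultdict

def select_traces(block_stream, hot_threshold=3, max_trace_len=4):
    """Toy hot-trace selector (illustrative, not the paper's algorithm):
    count basic-block executions and successor edges from a recorded
    control-flow stream, then grow a trace from each hot block by
    repeatedly following its most frequent successor."""
    exec_count = Counter(block_stream)
    edge_count = defaultdict(Counter)
    for src, dst in zip(block_stream, block_stream[1:]):
        edge_count[src][dst] += 1

    traces = []
    for block, n in exec_count.items():
        if n < hot_threshold:
            continue  # only hot blocks seed traces
        trace, cur = [block], block
        while len(trace) < max_trace_len and edge_count[cur]:
            nxt = edge_count[cur].most_common(1)[0][0]
            if nxt in trace:  # stop when the trace would cycle
                break
            trace.append(nxt)
            cur = nxt
        traces.append(trace)
    return traces
```

Because the stream is consumed off the critical path, the selector can afford heavier analysis (full edge profiles here) than instrumentation-based schemes allow.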
Learned Interferometric Imaging for the SPIDER Instrument
The Segmented Planar Imaging Detector for Electro-Optical Reconnaissance
(SPIDER) is an optical interferometric imaging device that aims to offer an
alternative to the large space telescope designs of today with reduced size,
weight and power consumption. This is achieved through interferometric imaging.
State-of-the-art methods for reconstructing images from interferometric
measurements adopt proximal optimization techniques, which are computationally
expensive and require handcrafted priors. In this work we present two
data-driven approaches for reconstructing images from measurements made by the
SPIDER instrument. These approaches use deep learning to learn prior
information from training data, increasing the reconstruction quality, and
significantly reducing the computation time required to recover images by
orders of magnitude. Reconstruction time is reduced to
milliseconds, opening up the possibility of real-time imaging with SPIDER for
the first time. Furthermore, we show that these methods can also be applied in
domains where training data is scarce, such as astronomical imaging, by
leveraging transfer learning from domains where plenty of training data are
available. Comment: 21 pages, 14 figures
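The classical baseline that learned methods improve on can be sketched in a few lines: treat the interferometric measurements as a subset of Fourier coefficients of the sky, and form a "dirty" image by zero-filling the unmeasured coefficients and inverting. The random sampling mask below is an assumption for illustration; the real SPIDER instrument has a structured lenslet/baseline layout, and the learned approaches replace this naive inversion with a trained network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sky image: a small bright source on an empty field.
sky = np.zeros((64, 64))
sky[30:34, 30:34] = 1.0

# Stand-in for interferometric sampling: measure ~30% of the sky's
# Fourier coefficients (visibilities), chosen at random here.
mask = rng.random((64, 64)) < 0.3
visibilities = np.fft.fft2(sky) * mask

# Classical "dirty image": zero-fill unmeasured visibilities and invert.
# Learned reconstruction replaces/augments this step with a network
# trained to suppress the resulting sampling artifacts.
dirty = np.real(np.fft.ifft2(visibilities))
```

The dirty image is fast but artifact-ridden, which is why reconstruction normally needs expensive priors; the paper's contribution is learning those priors so that inversion stays at millisecond cost.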
BlockChop: Dynamic Squash Elimination for Hybrid Processor Architecture
Hybrid processors are HW/SW co-designed processors that leverage blocked execution, the execution of regions of instructions as atomic blocks, to facilitate aggressive speculative optimization. As we move to a multicore hybrid design, fine-grained conflicts for shared data can violate the atomicity requirement of these blocks and lead to expensive squashes and rollbacks. However, as these atomic regions differ from those used in checkpointing and transactional memory systems, the extent of this potentially prohibitive problem remains unclear, and mechanisms to mitigate these squashes dynamically may be critical to enable a highly performant multicore hybrid design. In this work, we investigate how multithreaded applications, both benchmark and commercial workloads, are affected by squashes, and present dynamic mechanisms for mitigating these squashes in hybrid processors. While the current wisdom is that smaller atomic regions do not suffer a significant number of squashes, we observe this is not the case for many multithreaded workloads. With region sizes of just 200-500 instructions, we observe a performance degradation ranging from 10% to more than 50% for workloads with a mixture of shared reads and writes. By harnessing the unique flexibility provided by the software subsystem of hybrid processor design, we present BlockChop, a framework for dynamically mitigating squashes on multicore hybrid processors. We present a range of squash handling mechanisms leveraging retrials, interpretation, and retranslation, and find that BlockChop is quite effective. Over the current response to exceptions and squashes in a hybrid design, we are able to improve the performance of benchmark and commercial workloads by 1.4x and 1.2x on average for large and small region sizes, respectively.
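The retry-then-interpret escalation described above can be sketched as a simple policy: re-execute the translated atomic block a bounded number of times, then fall back to slower, non-atomic interpretation, which cannot be squashed. The function and exception names below are illustrative, not the paper's API, and this omits retranslation, the third mechanism the paper evaluates.

```python
class SquashConflict(Exception):
    """Models another core writing shared data inside an atomic block."""

def execute_with_mitigation(run_atomic, run_interpreted, max_retries=2):
    """Escalating squash response, loosely modelled on a
    retry-then-interpret policy: retry the fast atomic block up to
    max_retries times; if it keeps getting squashed, execute the region
    via (slower) interpretation, which needs no atomicity."""
    for _ in range(max_retries):
        try:
            return run_atomic(), "atomic"
        except SquashConflict:
            continue  # squash: state is rolled back, try the block again
    return run_interpreted(), "interpreted"
```

The point of the escalation is that persistent conflictors stop paying repeated rollback costs, while conflict-free blocks keep the fast atomic path.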
Rule By Example: Harnessing Logical Rules for Explainable Hate Speech Detection
Classic approaches to content moderation typically apply a rule-based
heuristic approach to flag content. While rules are easily customizable and
intuitive for humans to interpret, they are inherently fragile and lack the
flexibility or robustness needed to moderate the vast amount of undesirable
content found online today. Recent advances in deep learning have demonstrated
the promise of using highly effective deep neural models to overcome these
challenges. However, despite the improved performance, these data-driven models
lack transparency and explainability, often leading to mistrust from everyday
users and a lack of adoption by many platforms. In this paper, we present Rule
By Example (RBE): a novel exemplar-based contrastive learning approach for
learning from logical rules for the task of textual content moderation. RBE is
capable of providing rule-grounded predictions, allowing for more explainable
and customizable predictions compared to typical deep learning-based
approaches. We demonstrate that our approach is capable of learning rich rule
embedding representations using only a few data examples. Experimental results
on 3 popular hate speech classification datasets show that RBE is able to
outperform state-of-the-art deep learning classifiers as well as the use of
rules in both supervised and unsupervised settings while providing explainable
model predictions via rule-grounding. Comment: ACL 2023 Main Conference
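The rule-grounded prediction step can be illustrated with a nearest-rule classifier: embed the input text and each logical rule's exemplars into a shared space, label the text with the class of the most similar rule, and return that rule as the explanation. This toy stands in for RBE's learned contrastive embedding space; the embeddings and rule names below are made up for illustration.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def predict_with_grounding(text_emb, rule_embs, rule_labels):
    """Nearest-rule prediction: assign the label of the most similar
    rule embedding, and return that rule's index so the prediction can
    be grounded in (explained by) a specific rule."""
    sims = [cosine(text_emb, r) for r in rule_embs]
    best = int(np.argmax(sims))
    return rule_labels[best], best
```

In RBE the embedding spaces are trained contrastively so rule exemplars and the texts they cover land close together; here the vectors are supplied directly.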
Fast emulation of anisotropies induced in the cosmic microwave background by cosmic strings
Cosmic strings are linear topological defects that may have been produced
during symmetry-breaking phase transitions in the very early Universe. In an
expanding Universe the existence of causally separate regions prevents such
symmetries from being broken uniformly, with a network of cosmic strings
inevitably forming as a result. To faithfully generate observables of such
processes requires computationally expensive numerical simulations, which
prohibits many types of analyses. We propose a technique to instead rapidly
emulate observables, thus circumventing simulation. Emulation is a form of
generative modelling, often built upon a machine learning backbone. End-to-end
emulation often fails due to high dimensionality and insufficient training
data. Consequently, it is common to instead emulate a latent representation
from which observables may readily be synthesised. Wavelet phase harmonics are
an excellent latent representation for cosmological fields, both as a summary
statistic and for emulation, since they do not require training and are highly
sensitive to non-Gaussian information. Leveraging wavelet phase harmonics as a
latent representation, we develop techniques to emulate string induced CMB
anisotropies over a 7.2 degree field of view, with sub-arcminute resolution, in
under a minute on a single GPU. Beyond generating high fidelity emulations, we
provide a technique to ensure these observables are distributed correctly,
providing a more representative ensemble of samples. The statistics of our
emulations are commensurate with those calculated on comprehensive Nambu-Goto
simulations. Our findings indicate these fast emulation approaches may be
suitable for wide use in, e.g., simulation based inference pipelines. We make
our code available to the community so that researchers may rapidly emulate
cosmic string induced CMB anisotropies for their own analysis.
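The latent representation at the heart of this pipeline is the wavelet phase harmonic: keep a wavelet coefficient's modulus but multiply its phase by an integer p, so that covariances between different scales become sensitive to non-Gaussian phase alignments. The sketch below computes one such statistic on a 1D field using Gaussian band-pass filters as a simplified stand-in for a proper wavelet filter bank; it is a minimal illustration, not the paper's emulator.

```python
import numpy as np

def phase_harmonic(z, p):
    """Phase harmonic operator [z]^p = |z| exp(i p arg z):
    keep the modulus, multiply the phase by integer p."""
    return np.abs(z) * np.exp(1j * p * np.angle(z))

def wph_statistic(x, k0, k1, p0=0, p1=1, sigma=2.0):
    """Toy WPH covariance between two band-passed copies of a 1D field.
    Gaussian band-pass filters centred at frequencies k0, k1 stand in
    for wavelets; p0=0 takes the modulus of the first band."""
    n = x.size
    k = np.fft.fftfreq(n) * n
    xf = np.fft.fft(x)
    w0 = np.fft.ifft(xf * np.exp(-0.5 * ((k - k0) / sigma) ** 2))
    w1 = np.fft.ifft(xf * np.exp(-0.5 * ((k - k1) / sigma) ** 2))
    h0 = phase_harmonic(w0, p0)
    h1 = phase_harmonic(w1, p1)
    return np.mean((h0 - h0.mean()) * np.conj(h1 - h1.mean()))
```

No training is needed to evaluate such statistics, which is what makes them attractive as an emulation target compared with end-to-end generative models.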
Cross-species neuroscience: closing the explanatory gap
Neuroscience has seen substantial development in non-invasive methods available for investigating the living human brain. However, these tools are limited to coarse macroscopic measures of neural activity that aggregate the diverse responses of thousands of cells. To access neural activity at the cellular and circuit level, researchers instead rely on invasive recordings in animals. Recent advances in invasive methods now permit large-scale recording and circuit-level manipulations with exquisite spatiotemporal precision. Yet there has been limited progress in relating these microcircuit measures to the complex cognition and behaviour observed in humans. Contemporary neuroscience thus faces an explanatory gap between macroscopic descriptions of the human brain and microscopic descriptions in animal models. To close this gap, we propose adopting a cross-species approach. Despite dramatic differences in the size of mammalian brains, this approach is broadly justified by preserved homology. Here, we outline a three-armed approach for effective cross-species investigation that highlights the need to translate different measures of neural activity into a common space. We discuss how a cross-species approach has the potential to transform basic neuroscience while also benefiting neuropsychiatric drug development, where clinical translation has, to date, seen minimal success.
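One generic way to translate two measures of neural activity into a common space, as the abstract calls for, is canonical correlation analysis: find projections of each modality (say, human fMRI features and animal spiking features over matched task conditions) that are maximally correlated. The sketch below is a minimal CCA, offered purely as an illustration of the "common space" idea; the abstract does not specify this method.

```python
import numpy as np

def inv_sqrt(C, eps=1e-8):
    """Inverse matrix square root of a symmetric covariance (regularised)."""
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T

def cca(X, Y):
    """Minimal canonical correlation analysis: whiten each modality,
    then take the SVD of the whitened cross-covariance. Returns the
    canonical correlations (descending); the associated projections
    define the shared space."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx, Cyy = X.T @ X / n, Y.T @ Y / n
    Cxy = X.T @ Y / n
    K = inv_sqrt(Cxx) @ Cxy @ inv_sqrt(Cyy)
    _, s, _ = np.linalg.svd(K)
    return s
```

A large leading canonical correlation indicates that the two recording modalities share a low-dimensional component that can anchor cross-species comparison.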